System and method for searching deep web services

ABSTRACT

A system and method for searching deep web services are provided. The system and method in one aspect allow organizing communities, sources and schema attributes in a multi-tier containment relationship; searching representative schema attributes in one or more communities; searching representative services in one or more communities; searching for related schema attributes; and searching for related communities.

RELATED APPLICATIONS

This application is a continuation of U.S. Ser. No. 11/503,754, filed Aug. 14, 2006, the entire contents of which are incorporated herein by reference.

FIELD OF THE INVENTION

The present disclosure relates to searching and exploring deep web services or databases on the World Wide Web (web). More specifically, it relates to the application of metadata (or schema attributes) used to invoke and query deep web services and search methods to associate deep web services, their schema attributes and the communities to which they belong.

BACKGROUND OF THE INVENTION

The deep web describes web sites that offer dynamically generated web pages in response to an end-user submitted query. For example, the US Postal Service (USPS) offers shipment tracking on the web by allowing end-users to submit tracking numbers. The tracking number is processed by looking up its entry in the USPS database for the time and location of the referenced shipment. Similarly, at the Hertz web site, an end-user can submit a query for a specific date, location and car model to check the availability of vehicles for rent. This information is also composed of query results of the Hertz database. Both USPS and Hertz offer deep web services, in addition to statically linked web pages for generic informational content.

It has been observed that deep web services are increasingly being offered to allow business activities and commerce transactions on the web. A common theme is that they use web forms for users to fill in and submit formatted queries. As in the previous examples, USPS asks for a tracking number while Hertz asks for date, location and vehicle category. Unlike static web pages, these web forms make it very difficult for search engine robots to crawl the backend databases. Since search engines do not find deep web content, the deep web is sometimes referred to as the hidden web.

As deep web services proliferate, it becomes critical to understand, organize and search them.

BRIEF SUMMARY OF THE INVENTION

A system and method for searching deep web services are provided. In one aspect, the system and method provide for exploring the relationships between deep web communities, sources and schema attributes. The method of searching deep web services in one aspect includes searching a plurality of deep web service sources in one or more deep web service communities, looking up one or more schema attributes associated with each of the plurality of deep web service sources, incrementing a count associated with each of the one or more schema attributes, and returning a predetermined K number of most frequently occurring schema attributes based on the count associated with each of the one or more schema attributes. In one aspect, the method may take as input either a source or a community. If the input is a source, the method finds the community to which the source belongs and performs the search.

A system for searching deep web services in one aspect includes a processor operable to search a plurality of deep web service sources in one or more deep web service communities. The processor is further operable to look up one or more schema attributes associated with each of the plurality of deep web service sources. The processor is also operable to increment a count associated with each of the one or more schema attributes and to return a predetermined K number of most frequently occurring schema attributes based on the count associated with each of the one or more schema attributes. The system and method may enable a user such as a software developer to find representative deep web services.

Further features as well as the structure and operation of various embodiments are described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a graphical illustration of the organization of deep web communities, sources and their schema attributes in one embodiment of the present disclosure.

FIG. 2 is a flow diagram illustrating a method of searching for representative schema attributes in a service community in one embodiment of the present disclosure.

FIG. 3 is a flow diagram illustrating a method of searching for representative services in a service community in one embodiment of the present disclosure.

FIG. 4 is a flow diagram illustrating a method of searching for related schema attributes in one embodiment of the present disclosure.

FIG. 5 is a flow diagram illustrating a method of searching for related service communities in one embodiment of the present disclosure.

DETAILED DESCRIPTION

One or more search methods applicable to deep web services are provided. The methods of the present disclosure in an exemplary embodiment take advantage of co-occurrence of metadata or schema attributes in the web forms. It is to be appreciated that the term “deep web service” as used herein is intended to include any dynamic web pages that are invoked by filling some or all fields and by clicking the submit button on the page. Typically, a business or individual web site can contain one or more such dynamic web pages. An end-user is interested in providing the information requested on the web form in order to gain access to the backend database or information sources. An example is the airline flight status page, in which a user enters the flight number and retrieves the flight status.

The term “deep web service community” as used herein is intended to describe a group of deep web services, which may be formed by manual selection, or automated data mining process as described in the co-pending application entitled A Method and Apparatus for Organizing Data Sources assigned to the same assignee (docket number: YOR920060232US1) filed on ______. The term “deep web service source” as used herein describes a single dynamic web page or web form. The terms “schema attribute” or “schema metadata” refer to the semantic definition of a field on the web form. The actual attribute values are entered by users of the form. For example, flight number is a schema attribute while AA31, UA54, DL42 are valid values for this attribute.

FIG. 1 illustrates the organization and relationships among communities, sources, and attributes in one embodiment of the present disclosure. FIG. 1 illustrates two communities (100) and (102). A community (100) contains two sources (104) and (106). A containment relationship is illustrated by an arrow (108) pointing from (104) to (100). A source (104) has two attributes (110) and (112). Similarly, another containment relationship is illustrated by an arrow (114) pointing from (110) to (104). The same source can fall in one or more communities due to different grouping criteria that are user-defined and are not in the scope of this invention. For example, the source (106) belongs to two communities (100) and (102). Similarly, attributes with the same semantic meanings can be associated with multiple sources. The attribute (112) is associated with sources (104), (106) and (116).

FIG. 1 may be best understood with an example. All airline reservation services may be grouped in one community and all hotel reservation services in another community. Airline reservation services include airline web sites and online travel agencies. They typically ask for departure and arrival cities as well as dates. Hotel reservation services include hotel web sites and online travel agencies. They ask for city and check-in/check-out dates. Some services appear in both communities, since quite often both reservations are requested at the same time. Examples of shared attributes may include attributes such as dates and destination city.
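By way of illustration only, the multi-tier containment relationship of FIG. 1 could be held in two mappings, one from communities to sources and one from sources to attributes. The following is a minimal sketch under that assumption; the class name `Registry`, its methods, and the sample source names are hypothetical and do not appear in the present disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical container for the FIG. 1 organization."""
    # community name -> names of the sources it contains
    communities: dict = field(default_factory=dict)
    # source name -> schema attributes found on its web form
    sources: dict = field(default_factory=dict)

    def add_source(self, source, attributes, communities):
        """Register a source, its attributes, and every community containing it."""
        self.sources.setdefault(source, set()).update(attributes)
        for community in communities:
            self.communities.setdefault(community, set()).add(source)

    def communities_of(self, source):
        """Communities that contain the given source (a source may be in several)."""
        return {c for c, members in self.communities.items() if source in members}

    def sources_with(self, attribute):
        """Sources whose web forms use the given attribute."""
        return {s for s, attrs in self.sources.items() if attribute in attrs}


# Illustrative data modeled on the airline and hotel example above.
registry = Registry()
registry.add_source("airline_site", {"departure city", "arrival city", "dates"},
                    ["airline reservation"])
registry.add_source("hotel_site", {"destination city", "dates"},
                    ["hotel reservation"])
registry.add_source("travel_agency",
                    {"departure city", "arrival city", "destination city", "dates"},
                    ["airline reservation", "hotel reservation"])
```

In this sketch the online travel agency falls in both communities, and the attribute "dates" is shared by all three sources, mirroring the shared source (106) and shared attribute (112) of FIG. 1.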

The creation of the organization as illustrated in FIG. 1 is not in the scope of this invention. Rather it may be created manually or using the data mining method as described in the co-pending application entitled A Method and Apparatus for Organizing Data Sources. FIG. 2 to FIG. 5 illustrate four search methods based on the organization of deep web services.

FIG. 2 illustrates a method of searching for representative attributes in one or more communities in one embodiment of the present disclosure. Step 200 resets all attributes' counts to zero. Steps 202, 204 and 206 iterate over every source in the communities and increment counts of the attributes found. At 202, sources in one or more communities are searched. At 204, attributes for each source are searched. At 206, for each attribute, its count is incremented. Step 208 returns the K most frequently appearing (and therefore most widely shared) attributes. In one embodiment, the value K may be predetermined, for example, user defined based on specific cases.
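By way of illustration only, steps 200 through 208 could be sketched as follows. The function name, the dictionary layout, and the sample data are hypothetical. Two further assumptions are made because the disclosure does not address them: a source that belongs to several of the selected communities is counted once, and ties in count are broken alphabetically.

```python
from collections import Counter


def representative_attributes(communities, source_attributes, selected, k):
    """Return the k attributes that appear on the most sources (FIG. 2 sketch)."""
    counts = Counter()                       # step 200: every count starts at zero
    sources = set()
    for community in selected:               # step 202: gather sources in the communities
        sources.update(communities[community])
    for source in sources:                   # step 204: look up each source's attributes
        for attribute in source_attributes[source]:
            counts[attribute] += 1           # step 206: increment the attribute's count
    # step 208: highest count first; the alphabetical tie-break is an assumption
    ranked = sorted(counts, key=lambda a: (-counts[a], a))
    return ranked[:k]


# Illustrative data only.
COMMUNITIES = {"airline": {"a1", "a2", "agency"}}
SOURCE_ATTRIBUTES = {
    "a1": {"departure city", "arrival city", "dates"},
    "a2": {"departure city", "dates", "flight number"},
    "agency": {"departure city", "arrival city", "dates", "destination city"},
}

top_two = representative_attributes(COMMUNITIES, SOURCE_ATTRIBUTES, ["airline"], 2)
```

With this sample data, "dates" and "departure city" each appear on three sources and are returned for K equal to 2.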

FIG. 3 illustrates a method of searching for representative sources in a community in one embodiment of the present disclosure. Step (300) is the same as in FIG. 2 and identifies the K most frequently appearing attributes. Step (302) counts the number of matched attributes for a source. If more than L matches are found, step (304) returns the source. The returned sources are representative because their web forms apply the most commonly used attributes in the community. In one embodiment, the value L may be predetermined, for example, user defined based on specific cases.
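By way of illustration only, steps 300 through 304 could be sketched as follows. The function names, the parameter name `min_matches` (standing in for L), and the sample data are hypothetical, and the alphabetical tie-break among equally frequent attributes is an assumption.

```python
from collections import Counter


def top_k_attributes(sources, source_attributes, k):
    """K most frequently appearing attributes across the given sources."""
    counts = Counter(a for s in sources for a in source_attributes[s])
    return sorted(counts, key=lambda a: (-counts[a], a))[:k]


def representative_sources(community_sources, source_attributes, k, min_matches):
    """Return sources matching more than min_matches of the top-k attributes."""
    top = set(top_k_attributes(community_sources, source_attributes, k))  # step 300
    result = []
    for source in sorted(community_sources):
        matches = len(top & set(source_attributes[source]))               # step 302
        if matches > min_matches:                                         # step 304
            result.append(source)
    return result


# Illustrative data only.
AIRLINE_SOURCES = {"a1", "a2", "agency"}
SOURCE_ATTRIBUTES = {
    "a1": {"departure city", "arrival city", "dates"},
    "a2": {"departure city", "dates", "flight number"},
    "agency": {"departure city", "arrival city", "dates", "destination city"},
}

representative = representative_sources(AIRLINE_SOURCES, SOURCE_ATTRIBUTES, 3, 2)
```

With K equal to 3 and L equal to 2, sources "a1" and "agency" each match all three top attributes and are returned, while "a2" matches only two and is not.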

FIG. 4 illustrates a method of searching for related schema attributes in one embodiment of the present disclosure. For example, title, author and ISBN often appear together. Given user specified attributes (400), sources are iterated (402) to check if the specified ones appear (404). If they do, the remaining, unspecified attributes are recorded (406). In the end, the M attributes that most frequently occur together with the user specified attributes are returned. Such occurrences suggest affinity among attributes that are used to execute a web service task. At 400, a user specifies one or more attributes. At 402, sources in one or more communities are searched. At 404, for each source, its attributes are determined. At 406, if the specified attributes are found in the set of attributes, the count of every other attribute appearing in the source is incremented. At 408, the M most frequently co-occurring attributes are returned. The value of M may be predetermined, for instance, a user defined value.
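By way of illustration only, steps 400 through 408 could be sketched as follows. The function name and sample data are hypothetical. The sketch assumes that a source qualifies only when all of the user specified attributes appear on it, and that ties in count are broken alphabetically; the disclosure does not fix either point. For brevity the sources are passed in directly rather than being gathered from communities.

```python
from collections import Counter


def related_attributes(source_attributes, specified, m):
    """Return the m attributes that most often co-occur with the specified ones."""
    specified = set(specified)                        # step 400: user specified attributes
    counts = Counter()
    for attributes in source_attributes.values():     # steps 402 and 404
        attributes = set(attributes)
        if specified <= attributes:                   # step 406: all specified ones present
            counts.update(attributes - specified)     # count the remaining attributes
    # step 408: highest co-occurrence count first; tie-break is an assumption
    return sorted(counts, key=lambda a: (-counts[a], a))[:m]


# Illustrative data only.
BOOK_SOURCES = {
    "store_a": {"title", "author", "ISBN"},
    "store_b": {"title", "author", "publisher"},
    "library": {"title", "author", "ISBN", "price"},
    "forum": {"keyword"},
}

related = related_attributes(BOOK_SOURCES, ["title"], 2)
```

With this sample data, "author" co-occurs with "title" on three sources and "ISBN" on two, so those two attributes are returned for M equal to 2.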

Similarly, one may measure affinity among communities by counting the number of shared attributes. FIG. 5 is a flow diagram illustrating a method of searching for related service communities in one embodiment of the present disclosure. At 500, a user specifies a community. At 502, the K most frequently occurring attributes are searched for each community. At 504, the number of representative attributes (502) that are shared between the user-specified community and the searched community is determined. If the count is greater than N, the two communities may be treated as being strongly affiliated, and the one or more searched communities with a count greater than N are returned at 506. The value of N may be predetermined, for instance, a user defined value.
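By way of illustration only, steps 500 through 506 could be sketched as follows. The function names and sample data are hypothetical. The sketch assumes that the user specified community is excluded from its own results and that ties among equally frequent attributes are broken alphabetically.

```python
from collections import Counter


def top_k_attributes(sources, source_attributes, k):
    """K most frequently appearing attributes across the given sources."""
    counts = Counter(a for s in sources for a in source_attributes[s])
    return sorted(counts, key=lambda a: (-counts[a], a))[:k]


def related_communities(communities, source_attributes, target, k, n):
    """Return communities sharing more than n top-k attributes with the target."""
    target_top = set(top_k_attributes(communities[target], source_attributes, k))
    related = []
    for name in sorted(communities):
        if name == target:                  # assumption: skip the specified community
            continue
        other_top = set(top_k_attributes(communities[name], source_attributes, k))  # 502
        shared = len(target_top & other_top)                                        # 504
        if shared > n:                                                              # 506
            related.append(name)
    return related


# Illustrative data only.
COMMUNITIES = {
    "airline": {"a1", "a2", "agency"},
    "hotel": {"h1", "h2", "agency"},
    "books": {"b1"},
}
SOURCE_ATTRIBUTES = {
    "a1": {"departure city", "arrival city", "dates"},
    "a2": {"departure city", "arrival city", "dates"},
    "h1": {"destination city", "dates", "guests"},
    "h2": {"destination city", "dates", "guests"},
    "agency": {"departure city", "arrival city", "dates", "destination city"},
    "b1": {"title", "author", "ISBN"},
}

affiliated = related_communities(COMMUNITIES, SOURCE_ATTRIBUTES, "airline", 3, 0)
```

With K equal to 3 and N equal to 0, the hotel community shares the representative attribute "dates" with the airline community and is returned, while the books community shares none.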

The system and method of the present disclosure may be implemented and run on a general-purpose computer or computer system. The computer system may be any type of system now known or later developed and may typically include a processor, memory device, a storage device, input/output devices, internal buses, and/or a communications interface for communicating with other computer systems in conjunction with communication hardware and software, etc.

The term “computer system” as may be used in the present application may include a variety of combinations of fixed and/or portable computer hardware, software, peripherals, and storage devices. The computer system may include a plurality of individual components that are networked or otherwise linked to perform collaboratively, or may include one or more stand-alone components. The hardware and software components of the computer system of the present application may include and may be included within fixed and portable devices such as desktops, laptops, and servers.

The embodiments described above are illustrative examples and it should not be construed that the present invention is limited to these particular embodiments. Thus, various changes and modifications may be effected by one skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.

1. A method of searching deep web services, comprising: searching a plurality of deep web service sources in one or more deep web service communities; looking-up one or more schema attributes associated with each of the plurality of deep web services sources; incrementing a count associated with each of the one or more schema attributes; and returning a predetermined K number of most frequently occurring schema attributes based on the count associated with each of the one or more schema attributes.

2. The method of claim 1, further including: receiving at least one of a source and a community as an input for searching; and if the input is a source, finding a deep web service community to which the source belongs and the step of searching includes searching a plurality of deep web service sources in the deep web service community.

3. The method of claim 1, further including: searching for the predetermined K number of most frequently occurring schema attributes in a plurality of deep web service sources in one or more deep web service communities; determining for each of the plurality of deep web service sources, a number of occurrences of the predetermined K number of most frequently occurring schema attributes; and returning one or more deep web service sources that are determined to have more than a predetermined L number of occurrences of one or more of the predetermined K number of most frequently occurring schema attributes.

4. The method of claim 3, further including: receiving at least one of an attribute and keyword as an input for searching; and finding a deep web service community to which the attribute or the keyword belongs, and the step of searching includes searching for the predetermined K number of most frequently occurring schema attributes in a plurality of deep web service sources in the deep web service community.

5. The method of claim 1, further including: receiving at least one of a source, an attribute, and a keyword as an input for searching; and finding a deep web service community to which the source, the attribute, or the keyword belongs.

6. The method of claim 1, further including: receiving a user specified schema attribute; if the user specified schema attribute is found in a set of schema attributes associated with each of the plurality of deep web service sources, incrementing one or more co-occurrence counts associated with one or more schema attributes in the set; and returning a predetermined M number of most frequently occurring co-occurred attributes based on the one or more co-occurrence counts.

7. The method of claim 1, further including: receiving a user specified community; for each deep web service community, determining a count of schema attributes matching one or more schema attributes in the user specified community; and returning one or more deep web service community having greater than a predetermined N number of matching schema attributes.

8. The method of claim 1, further including: receiving a user specified source; for each deep web service source, determining a count of schema attributes matching one or more schema attributes in the user specified source; and returning one or more deep web service source having greater than a predetermined N number of matching schema attributes.

9. A system for searching deep web services, comprising: a processor operable to search a plurality of deep web service sources in one or more deep web service communities, the processor further operable to look-up one or more schema attributes associated with each of the plurality of deep web service sources, the processor further operable to increment a count associated with each of the one or more schema attributes and to return a predetermined K number of most frequently occurring schema attributes based on the count associated with each of the one or more schema attributes.